Maximally stable component detection is a very popular method for feature analysis in images, mainly due to its low computation 
cost and high repeatability. With the recent advance of feature-based methods in geometric shape analysis, there is significant 
interest in finding analogous approaches in the 3D world. In this paper, we formulate a diffusion-geometric framework for stable 
component detection in non-rigid 3D shapes, which can be used for geometric feature detection and description. A quantitative 
evaluation of our method on the SHREC 10 feature detection benchmark shows its potential as a source of high-quality features. 
1. Introduction 

Over the past decade, feature-based methods have become 
a ubiquitous tool in image analysis and a de facto standard in 
many computer vision and pattern recognition problems. More 
recently, there has been an increased interest in developing sim- 
ilar methods for the analysis of 3D shapes. Feature descrip- 
tors play an important role in many shape analysis applications, 
such as finding shape correspondence [31] or assembling fractured
models [11] in computational archaeology. Bags of features
[28, 24, 32] and similar approaches [21] were introduced
as a way to construct global shape descriptors that can be
efficiently used for large-scale shape retrieval. 

Many shape feature detectors and descriptors draw inspira- 
tion from and follow analogous methods in image analysis. For 
example, detection of geometric structures analogous to corners [27]
and edges [14] in images has been studied. The histogram of intrinsic
gradients used in [35] is similar in principle to the scale-invariant
feature transform (SIFT) [16], which has recently become extremely
popular in image analysis. In [10], the integral invariant signatures [17]
successfully employed in 2D shape analysis were extended to 3D shapes. 

Examples of 3D-specific descriptors include the popular spin 
image [12], based on a representation of the shape normal field in 
a local system of coordinates. Recent studies introduced versatile
and computationally efficient descriptors based on the heat 
kernel [30, 3], describing the local heat propagation properties 
on a shape. The advantage of these methods is the fact that heat 
diffusion geometry is intrinsic and thus deformation-invariant, 
which makes descriptors based on it applicable in deformable 
shape analysis. 

1.1. Related work 

A different class of feature detection methods tries to find sta- 
ble components or regions in the analyzed image or shape. In 
the image processing literature, the watershed transform is the 
precursor of many algorithms for stable component detection 
[33]. In the computer vision and image analysis community, 
stable component detection is used in the maximally stable extremal
regions (MSER) algorithm [18]. MSER represents intensity
level sets as a component tree and attempts to find level 
sets with the smallest area variation across intensity; the use of 
area ratio as the stability criterion makes this approach affine-invariant,
which is an important property in image analysis, as 
it approximates viewpoint transformations. Alternative stability
criteria based on geometric scale-space analysis have been 
recently proposed in [13]. 

In the shape analysis community, shape decomposition into 
characteristic primitive elements was explored in [22]. Methods 
similar to MSER have been explored in the works on topological
persistence [8]. Persistence-based clustering [4] was used 
by Skraba et al. [29] to perform shape segmentation. In [7], 
Digne et al. extended the notion of vertex-weighted component 
trees to meshes and proposed to detect MSER regions using the 
mean curvature. The approach was tested only in a qualitative 
way, and not evaluated as a feature detector. 

1.2. Main contribution 

The main contribution of our framework is three-fold. First, 
in Section 3 we introduce a generic framework for stable 
component detection, which unites vertex- and edge-weighted 
graph representations (as opposed to the vertex weighting used 
in image and shape maximally stable component detectors 
[18, 7]). Our results (see Section 6) show that the edge-weighted
formulation is more versatile and outperforms its 
vertex-weighted counterpart in terms of feature repeatability. 
Second, in Section 4 we introduce diffusion-geometric weighting
functions suitable for both vertex- and edge-weighted component
trees. We show that such functions are invariant under
a large class of transformations, in particular non-rigid inelastic
deformations, making them especially attractive in non-rigid
shape analysis. We also show several ways of constructing
scale-invariant weighting functions. Third, in Section 6 we 
show a comprehensive evaluation of different settings of our 
method on a standard feature detection benchmark comprising 
shapes undergoing a variety of transformations (also see Figures
1 and 2). 



2. Diffusion geometry 

Diffusion geometry is an umbrella term referring to the geometric
analysis of diffusion or random walk processes. We model 
a shape as a compact two-dimensional Riemannian manifold $X$. 
In its simplest setting, a diffusion process on $X$ is described by 
the partial differential equation 

$$\left(\frac{\partial}{\partial t} + \Delta\right) f(t, x) = 0, \tag{1}$$

called the heat equation, where $\Delta$ denotes the positive-semidefinite
Laplace-Beltrami operator associated with the Riemannian
metric of $X$. The heat equation describes the propagation
of heat on the surface, and its solution $f(t, x)$ is the heat distribution
at a point $x$ at time $t$. The initial condition of the equation
is some initial heat distribution $f(0, x)$; if $X$ has a boundary, 
appropriate boundary conditions must be added. 

The solution of (1) corresponding to a point initial condition 
$f(0, x) = \delta(x, y)$ is called the heat kernel and represents the 
amount of heat transferred from $x$ to $y$ in time $t$ due to the diffusion
process. The value of the heat kernel $h_t(x, y)$ can also 
be interpreted as the transition probability density of a random 
walk of length $t$ from the point $x$ to the point $y$. 

Using spectral decomposition, the heat kernel can be represented as 

$$h_t(x, y) = \sum_{i \ge 0} e^{-\lambda_i t} \phi_i(x) \phi_i(y). \tag{2}$$

Here, $\phi_i$ and $\lambda_i$ denote, respectively, the eigenfunctions and 
eigenvalues of the Laplace-Beltrami operator satisfying $\Delta \phi_i = \lambda_i \phi_i$
(without loss of generality, we assume $\lambda_i$ to be sorted in 
increasing order starting with $\lambda_0 = 0$). Since the Laplace-Beltrami
operator is an intrinsic geometric quantity, i.e., it can 
be expressed solely in terms of the metric of $X$, its eigenfunctions
and eigenvalues as well as the heat kernel are invariant 
under isometric transformations (bending) of the shape. 

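The spectral expansion of the heat kernel can be illustrated with a short numerical sketch. This is only a minimal example and not the implementation used in the paper: the function name `heat_kernel` is hypothetical, and the graph Laplacian of a closed chain of six vertices is assumed as a toy stand-in for the Laplace-Beltrami operator of a real shape.

```python
import numpy as np

def heat_kernel(evals, evecs, t):
    """Return the matrix of h_t(x, y) = sum_i exp(-lambda_i t) phi_i(x) phi_i(y)."""
    # evecs has shape (N, k): column i holds phi_i sampled at the N points.
    return (evecs * np.exp(-evals * t)) @ evecs.T

# Toy operator (an assumption): graph Laplacian of a closed chain of 6 vertices.
n = 6
shift = np.roll(np.eye(n), 1, axis=0)
L = 2.0 * np.eye(n) - shift - shift.T
evals, evecs = np.linalg.eigh(L)      # ascending eigenvalues, lambda_0 = 0

H = heat_kernel(evals, evecs, t=0.5)
auto_diffusivity = np.diag(H)         # h_t(x, x) at every vertex
```

With the full spectrum, each row of `H` sums to one (heat is conserved) and `H` tends to the identity as `t` goes to zero; truncating the expansion to the first few eigenpairs, as is done in practice, only approximates these properties.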
The parameter $t$ can be given the meaning of scale, and the 
family $\{h_t\}_t$ of heat kernels can be thought of as a scale-space of 
functions on $X$. By integrating over all scales, a scale-invariant 
version of (2) is obtained, 

$$c(x, y) = \sum_{i \ge 1} \frac{1}{\lambda_i} \phi_i(x) \phi_i(y). \tag{3}$$

This kernel is referred to as the commute-time kernel and can 
be interpreted as the transition probability density of a random 
walk of any length. 

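A corresponding sketch for the commute-time kernel is given below. It is an illustration under stated assumptions: the function name `commute_time_kernel` and the tolerance `tol` used to skip the zero eigenvalue are hypothetical, and the toy operator is again the Laplacian of a closed chain of six vertices with unit area elements.

```python
import numpy as np

def commute_time_kernel(evals, evecs, tol=1e-10):
    """Return c(x, y) = sum over nonzero lambda_i of phi_i(x) phi_i(y) / lambda_i."""
    keep = evals > tol                 # drop the constant eigenfunction (lambda_0 = 0)
    return (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T

# Toy operator (an assumption): graph Laplacian of a closed chain of 6 vertices.
n = 6
shift = np.roll(np.eye(n), 1, axis=0)
L = 2.0 * np.eye(n) - shift - shift.T
evals, evecs = np.linalg.eigh(L)

C = commute_time_kernel(evals, evecs)
```

In this unit-area toy setting, `C` acts as the inverse of `L` on functions with zero mean. The weights $1/\lambda_i$ decay slowly, so a truncated sum converges more slowly than the heat kernel expansion.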
By setting $y = x$, both the heat and the commute-time kernels, 
$h_t(x, x)$ and $c(x, x)$, express the probability density of remaining
at a point $x$, respectively after time $t$ and after any time. 
The value $h_t(x, x)$, sometimes referred to as the auto-diffusivity 
function, is related to the Gaussian curvature $K(x)$ through 

$$h_t(x, x) \approx \frac{1}{4\pi t} \left(1 + \frac{1}{6} K(x)\, t + O(t^2)\right). \tag{4}$$

This relation coincides with the well-known fact that heat tends 
to diffuse slower at points with positive curvature, and faster at 
points with negative curvature. 

For any $t > 0$, the values of $h_t(x, y)$ at every $x$ and $y \in B_\epsilon(x)$ in 
a small neighborhood around $x$ contain full information about 
the intrinsic geometry of the shape. Furthermore, Sun et al. [30] 
show that under mild technical conditions, the set $\{h_t(x, x)\}_{t > 0}$ 
is also fully informative (note that the auto-diffusivity function 
has to be evaluated at all values of $t$ in order to contain full 
information about the shape metric). 

2.1. Numerical computation 

In the discrete setting, we assume that the shape is sampled at 
a finite number of points $V = \{v_1, \dots, v_N\}$, upon which a simplicial
complex (triangular mesh) with vertices $V$, edges $E \subset V \times V$ 
and faces $F \subset V \times V \times V$ is constructed. The computation of 
the discrete heat kernel $h_t(v_1, v_2)$ and the associated diffusion 
geometry constructs is performed using formula (2), in which a 
finite number of eigenvalues and eigenfunctions of the discrete 
Laplace-Beltrami operator are taken. The latter can be computed
directly using the finite elements method (FEM) [26], or 
by discretization of the Laplace operator on the mesh followed 
by its eigendecomposition. Here, we adopt the second approach, 
according to which the discrete Laplace-Beltrami operator is 
expressed in the following generic form, 

$$(\Delta_X f)_i = \frac{1}{a_i} \sum_j w_{ij} (f_i - f_j), \tag{5}$$

where $f_i = f(v_i)$ is a scalar function defined on $V$, $w_{ij}$ are 
weights, and $a_i$ are normalization coefficients. In matrix notation,
(5) can be written as $\Delta_X f = A^{-1} W f$, where $f$ is an 
$N \times 1$ vector, $A = \mathrm{diag}(a_i)$ and $W = \mathrm{diag}\left(\sum_{l \ne i} w_{il}\right) - (w_{ij})$. 
The discrete eigenfunctions and eigenvalues are found by solving
the generalized eigendecomposition [15] $W \Phi = A \Phi \Lambda$, 
where $\Lambda = \mathrm{diag}(\lambda_l)$ is a diagonal matrix of eigenvalues and 
$\Phi = (\phi_l(v_i))$ is the matrix of the corresponding eigenvectors. 

Different choices of $A$ and $W$ have been studied, depending 
on which continuous properties of the Laplace-Beltrami operator
one wishes to preserve [34]. For triangular meshes, a 
popular choice adopted in this paper is the cotangent weight 
scheme |25, 3, in which 



$$w_{ij} = \begin{cases} (\cot \alpha_{ij} + \cot \beta_{ij})/2, & (v_i, v_j) \in E; \\ 0, & \text{else}, \end{cases} \tag{6}$$

where $\alpha_{ij}$ and $\beta_{ij}$ are the two angles opposite to the edge between
vertices $v_i$ and $v_j$ in the two triangles sharing the edge, 
and $a_i$ are the discrete area elements. 

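The cotangent discretization and the generalized eigenproblem can be sketched as follows. This is a minimal dense-matrix illustration, not the implementation used in the paper. The function names are hypothetical, the area element $a_i$ is assumed to be one third of the area of the triangles incident to $v_i$ (one common choice), and boundary edges simply receive the single cotangent available to them.

```python
import numpy as np

def cotangent_laplacian(verts, faces):
    """Return (W, a): dense cotangent matrix W and lumped vertex areas a."""
    n = len(verts)
    W = np.zeros((n, n))
    a = np.zeros(n)
    for tri in faces:
        p0, p1, p2 = verts[tri[0]], verts[tri[1]], verts[tri[2]]
        tri_area = 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))
        for k in range(3):
            i, j, o = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            a[i] += tri_area / 3.0            # assumed area element: 1/3 of incident triangles
            u, v = verts[i] - verts[o], verts[j] - verts[o]
            # Cotangent of the angle at vertex o, which is opposite to the edge (i, j).
            cot = np.dot(u, v) / np.linalg.norm(np.cross(u, v))
            w = 0.5 * cot
            W[i, j] -= w
            W[j, i] -= w
            W[i, i] += w
            W[j, j] += w
    return W, a

def laplacian_eigenpairs(W, a, k):
    """Solve W Phi = A Phi Lambda for the k smallest eigenvalues, with A = diag(a)."""
    s = 1.0 / np.sqrt(a)
    evals, u = np.linalg.eigh(s[:, None] * W * s[None, :])   # equivalent symmetric problem
    return evals[:k], s[:, None] * u[:, :k]                  # columns are A-orthonormal

# Toy mesh (an assumption): a single unit right triangle embedded in 3D.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = [(0, 1, 2)]
W, a = cotangent_laplacian(verts, faces)
evals, phi = laplacian_eigenpairs(W, a, k=3)
```

For meshes of the size used in the experiments, sparse matrices and an iterative eigensolver would be the natural choice; the dense version above is only meant to make the construction explicit.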


3. Maximally stable components 

Let us now focus on the undirected graph with the vertex 
set $V$ and edge set $E$ underlying the discretization of a shape, 
which with some abuse of notation we will henceforth denote 
as $X = (V, E)$. We say that two vertices $v_1$ and $v_2$ are adjacent if 
$(v_1, v_2) \in E$. An ordered sequence $\pi = \{v_1, \dots, v_k\}$ of vertices is 
called a path if for any $i = 1, \dots, k - 1$, $v_i$ is adjacent to $v_{i+1}$. In 
this case, we say that $v_1$ and $v_k$ are linked in $X$. The graph is said 
to be connected if every pair of vertices in it is linked. A graph 
$Y = (V' \subseteq V, E' \subseteq E)$ is called a subgraph of $X$ and denoted 
by $Y \subseteq X$. We say that $Y$ is a (connected) component of $X$ if $Y$ 
is a connected subgraph of $X$ that is maximal for this property 
(i.e., for any connected subgraph $Z$, $Y \subseteq Z \subseteq X$ implies $Y = Z$). 
Given $E' \subseteq E$, the graph induced by $E'$ is the graph $Y = (V', E')$ 
whose vertex set is made of all vertices belonging to an edge in 
$E'$, i.e., $V' = \{v \in V : \exists v' \in V, (v, v') \in E'\}$. 

A scalar function $f : V \to \mathbb{R}$ is called a vertex weight, and 
a graph equipped with it is called vertex-weighted. Similarly, a 
graph equipped with a function $d : E \to \mathbb{R}$ defined on the edge 
set is called edge-weighted. In what follows, we will assume 
both types of weights to be non-negative. Grayscale images are 
often represented as vertex-weighted graphs with some regular 
(e.g., four-neighbor) connectivity and weights corresponding to 
the intensity of the pixels. Edge weights can be obtained, for ex- 
ample, by considering a local distance function measuring the 
dissimilarity of pairs of adjacent pixels. While vertex weighting 
is limited to scalar (grayscale) images, edge weighting is more 
general. 

3.1. Component trees 

Let $(X, f)$ be a vertex-weighted graph. For $\ell \ge 0$, the $\ell$-cross-section
of $X$ is defined as the graph induced by $E_\ell = \{(v_1, v_2) \in 
E : f(v_1), f(v_2) \le \ell\}$. Similarly, a cross-section of an edge-weighted
graph $(X, d)$ is induced by the edge subset $E_\ell = \{e \in 
E : d(e) \le \ell\}$. A connected component of the cross-section is 
called an $\ell$-level set of the weighted graph. 

For any component $C$ of $X$, we define the altitude $\ell(C)$ as the 
minimal $\ell$ for which $C$ is a component of the $\ell$-cross-section of 
$X$. Altitudes establish a partial order relation on the connected 
components of $X$, as any component $C$ is contained in a component
with higher altitude. The set of all such pairs $(\ell(C), C)$ 
therefore forms a tree called the component tree. Note that the 
above definitions are valid for both vertex- and edge-weighted 
graphs. 

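For the edge-weighted case, the construction of the component tree can be sketched with a plain union-find sweep over the edges sorted by weight. The sketch below is an illustration only, not the quasi-linear algorithm used in the paper. The function name and the dictionary layout of the nodes are hypothetical, and the edge weights are assumed to be distinct, so that every merge corresponds to exactly one new component.

```python
def edge_component_tree(n_vertices, edges, weights, vertex_area=None):
    """Sweep edges by increasing weight; each merge of two sets creates one tree node."""
    area = [1.0] * n_vertices if vertex_area is None else list(vertex_area)
    parent = list(range(n_vertices))    # union-find forest over the vertices
    node_of = [None] * n_vertices       # tree node attached to each set root
    nodes = []

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for w, (u, v) in sorted(zip(weights, edges)):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                        # edge lies inside an existing component
        parent[rv] = ru
        area[ru] += area[rv]
        children = [k for k in (node_of[ru], node_of[rv]) if k is not None]
        nodes.append({"altitude": w, "area": area[ru], "children": children})
        node_of[ru] = len(nodes) - 1
    return nodes

# Toy graph (an assumption): the path 0-1-2-3 with distinct edge weights.
tree = edge_component_tree(4, [(0, 1), (1, 2), (2, 3)], [1.0, 3.0, 2.0])
```

Each node stores its altitude, the accumulated area of its component, and the indices of the child components it absorbed; in the toy example the root at altitude 3 merges the two components created at altitudes 1 and 2.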


3.2. Maximally stable components 

Since in our discussion undirected graphs are used as a discretization
of smooth manifolds, we can associate with every 
component $C$ (or every subset of the vertex set in general) a 
measure of area, $A(C)$. In the simplest setting, the area of $C$ 
can be thought of as its cardinality. In a better discretization, 
each vertex $v$ in the graph is associated with a discrete area element
$da(v)$, and the area of a component is defined as 

$$A(C) = \sum_{v \in C} da(v). \tag{7}$$

Let now $\{(\ell, C_\ell)\}$ be a sequence of nested components forming
a branch in the component tree. We define the instability of 
$C_\ell$ as 

$$s(\ell) = \frac{dA(C_\ell)}{d\ell}. \tag{8}$$

In other words, the more the area of a component changes with 
the change of $\ell$, the less stable it is. A component $C_{\ell^*}$ is called 
maximally stable if the instability function has a local minimum 
at $\ell^*$. Maximally stable components are widely known in the 
computer vision literature under the name of maximally stable 
extremal regions, or MSERs for short [18], with $s(\ell^*)$ usually 
referred to as the region score. 

It is important to note that in their original definition, MSERs 
were defined on a component tree of a vertex-weighted graph, 
while our definition is more general and allows for edge-weighted
graphs as well. The importance of such an extension 
will become evident in the sequel. Also, the original MSER 
algorithm [18] assumes the vertex weights to be quantized, 
while our formulation is suitable for scalar fields whose dynamic
range is unknown a priori. 

3.3. Computational aspects 

We use the quasi-linear time algorithm detailed in [23] for 
the construction of vertex-weighted component trees, and its 
straightforward adaptation to the edge-weighted case. The algorithm
is based on the observation that the vertex set $V$ can be 
partitioned into disjoint sets which are merged together as one 
goes up in the tree. Maintaining and updating such a partition 
can be performed very efficiently using the union-find algorithm 
and related data structures. The resulting tree construction complexity
is $O(N \log \log N)$. 

The derivative (8) of the component area with respect to $\ell$ 
constituting the instability function is computed using finite differences
in each branch of the tree. For example, in a branch 
$C_{\ell_1} \subseteq C_{\ell_2} \subseteq \dots \subseteq C_{\ell_K}$, 

$$s(\ell_k) \approx \frac{A(C_{\ell_{k+1}}) - A(C_{\ell_{k-1}})}{\ell_{k+1} - \ell_{k-1}}. \tag{9}$$

The function is evaluated and its local minima are detected in 
a single pass over the branches of the component tree starting 
from the leaf nodes. We further filter out maximally stable regions
with too high values of $s$. In cases where two nested 
regions overlapping by more than a predefined threshold are detected
as maximally stable, only the bigger one is kept. 


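The finite-difference evaluation of the instability along one branch of the component tree, and the search for its local minima, can be sketched as follows. The function names and the `max_instability` cutoff parameter are hypothetical, the altitudes along the branch are assumed to be strictly increasing, and the removal of heavily overlapping nested regions is not shown.

```python
import numpy as np

def instability_along_branch(levels, areas):
    """Central differences of the area with respect to the altitude along one branch."""
    levels = np.asarray(levels, dtype=float)
    areas = np.asarray(areas, dtype=float)
    s = np.full(len(levels), np.inf)      # branch endpoints have no central difference
    s[1:-1] = (areas[2:] - areas[:-2]) / (levels[2:] - levels[:-2])
    return s

def maximally_stable_indices(s, max_instability=np.inf):
    """Indices of local minima of s whose value does not exceed the cutoff."""
    return [k for k in range(1, len(s) - 1)
            if s[k] < s[k - 1] and s[k] <= s[k + 1] and s[k] <= max_instability]

# Toy branch (an assumption): altitudes and areas of six nested components.
levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
areas = [1.0, 5.0, 6.0, 6.5, 10.0, 20.0]
s = instability_along_branch(levels, areas)
stable = maximally_stable_indices(s)
```

In the toy branch the area barely changes around the third component, so it is the only one reported as maximally stable; lowering `max_instability` below its score discards it as well.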

4. Weighting functions 

Unlike images, where methods based on the analysis of the 
component tree have been shown to be extremely successful,
e.g., for segmentation or affine-invariant feature detection 
(namely, the MSER feature detector), similar techniques have 
been only scarcely explored for 3D shapes (with the notable 
exceptions of [7] and [29]). One possible reason is the fact 
that while images readily offer pixel intensities as the trivial 
vertex weight field, 3D shapes are not generally equipped with 
any such field. While the use of the mean curvature was proposed
in [7], it lacks most of the invariance properties required in 
deformable shape analysis. Here, we follow [29] in adopting 
the diffusion geometry framework and show that it allows 
constructing both vertex and edge weights suitable for the definition
of maximally stable components with many useful properties.



Given a vertex $v$, the values of the discrete auto-diffusivity 
function can be directly used as the vertex weights, 

$$f(v) = h_t(v, v). \tag{10}$$



Maximally stable components defined this way are intrinsic 
and, thus, invariant to non-rigid bending. Such strong invari- 
ance properties are particularly useful in the analysis of de- 
formable shapes. However, unlike images where the inten- 
sity field contains all information about the image, the above 
weighting function does not describe the intrinsic geometry of 
the shape entirely. It furthermore depends on the selection of 
the scale parameter t. 

Edge weights constitute a more flexible alternative that can 
incorporate fuller geometric information. The simplest edge 
weighting scheme can be obtained from a vector-valued field 
defined on the vertices of the graph. For example, associating 
$h_t(v, v)$ for $t \in [t_1, t_2]$ with each vertex $v$, one can define an edge 
weighting function 

$$d(v_1, v_2) = \|h_t(v_1, v_1) - h_t(v_2, v_2)\|_t = \left(\int_{t_1}^{t_2} \left(h_t(v_1, v_1) - h_t(v_2, v_2)\right)^2 dt\right)^{1/2} \tag{11}$$

(here, we write $\|\cdot\|_t$ to make explicit that the norm is taken 
with respect to the variable $t$). The function has a closed-form 
expression that can be obtained by substituting the spectral decomposition
(2) of the heat kernel. The advantage of this approach
stems from its ability to incorporate multiple scales. 
Theoretically, the set of $h_t(v, v)$ at all scales contains full information
about the intrinsic geometry of the shape. 

In a more general setting, edge weights do not necessarily 
need to stem from any finite- or infinite-dimensional vector field 
defined on the vertices. For example, since the discrete heat 
kernel $h_t(v_1, v_2)$ represents "proximity" between $v_1$ and $v_2$, a 
function inversely proportional to the value of the heat kernel, 
e.g., 

$$d(v_1, v_2) = \frac{1}{h_t(v_1, v_2)}, \tag{12}$$

can be used as an edge weight. For sufficiently small values of 
$t$, this function also contains full information about the shape's 
intrinsic geometry. 

Another way of creating edge weights inversely related 
to $h_t$ is by integrating the squared difference between the kernels 
centered at $v_1$ and $v_2$ over the entire shape, 

$$d(v_1, v_2) = \|h_t(v_1, \cdot) - h_t(v_2, \cdot)\| = \left(\sum_{v \in V} \left(h_t(v_1, v) - h_t(v_2, v)\right)^2 da(v)\right)^{1/2}. \tag{13}$$

This construction has been previously introduced in [5] under 
the name of diffusion distance, which constitutes an intrinsic 
metric on $X$ and is fully informative for small $t$. 

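The two kernel-based edge weights can be sketched numerically as shown below. The function names are hypothetical, and the toy operator (a closed chain of six vertices with non-uniform area elements) is an assumption. The spectral form of the diffusion distance used here follows from substituting the eigen-expansion of the heat kernel into the area-weighted sum, and relies on the eigenvectors being orthonormal with respect to the area elements.

```python
import numpy as np

def inverse_heat_kernel_weight(evals, evecs, t, v1, v2):
    """d(v1, v2) = 1 / h_t(v1, v2), with h_t taken from the spectral expansion."""
    h = np.sum(np.exp(-evals * t) * evecs[v1] * evecs[v2])
    return 1.0 / h

def diffusion_distance(evals, evecs, t, v1, v2):
    """Spectral form of the area-weighted L2 distance between h_t(v1, .) and h_t(v2, .)."""
    diff = evecs[v1] - evecs[v2]
    return np.sqrt(np.sum(np.exp(-2.0 * evals * t) * diff ** 2))

# Toy setting (an assumption): closed chain of 6 vertices, non-uniform area elements.
n = 6
shift = np.roll(np.eye(n), 1, axis=0)
W = 2.0 * np.eye(n) - shift - shift.T
a = np.linspace(1.0, 2.0, n)
s = 1.0 / np.sqrt(a)
evals, u = np.linalg.eigh(s[:, None] * W * s[None, :])
evecs = s[:, None] * u                      # evecs.T @ diag(a) @ evecs = identity

t = 0.7
H = (evecs * np.exp(-evals * t)) @ evecs.T  # full heat kernel matrix
direct = np.sqrt(np.sum((H[0] - H[1]) ** 2 * a))
spectral = diffusion_distance(evals, evecs, t, 0, 1)
weight = inverse_heat_kernel_weight(evals, evecs, t, 0, 1)
```

Here `direct` evaluates the sum over all vertices explicitly, while `spectral` needs only the two rows of the eigenvector matrix. With a truncated spectrum on a real mesh, the heat kernel between distant vertices can become very small or slightly negative, so the inverse weight would need some safeguard in practice.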
4.1. Scale invariance 

The vertex weighting function (10) and the edge weighting 
functions (11), (12) and (13) based on the heat kernel are not 
scale invariant, since a global scaling of the shape by a factor 
$\gamma > 0$ changes the heat kernel to $\gamma^2 h_{\gamma^2 t}(v_1, v_2)$, scaling by $\gamma^2$ 
both the time parameter and the kernel itself. A possible remedy
is to replace the heat kernel by the scale-invariant commute-time
kernel. However, due to the slow decay of the expansion 
coefficients $\lambda_i^{-1}$ in (3) compared to $e^{-\lambda_i t}$ in (2), the numerical 
computation of the commute-time kernel is more difficult, as it 
requires many more eigenfunctions of the Laplacian to achieve 
the same accuracy. 

As an alternative, it is possible to use a sequence of transformations
of $h_t(v_1, v_2)$ that renders it scale invariant [3]. First, the 
heat kernel is sampled logarithmically in time. Next, the logarithm
and a derivative with respect to time of the heat kernel 
values are taken to undo the multiplicative constant. Finally, 
taking the magnitude of the Fourier transform undoes 
the scaling of the time variable. This yields the modified heat 
kernel of the form 

$$\hat{h}_\omega(v_1, v_2) = \left| \mathcal{F}\left\{ \frac{\partial \log h_t(v_1, v_2)}{\partial \log t} \right\}(\omega) \right|, \tag{14}$$

where $\omega$ denotes the frequency variable of the Fourier transform.
The transform is computed numerically using the FFT 
as detailed in [3]. Substituting $\hat{h}_\omega$ into (11)-(12) yields scale-invariant
edge weighting functions (see footnote 3). By selecting a single frequency
$\omega$, one can construct a scale-invariant vertex weight 
$f(v) = \hat{h}_\omega(v, v)$ similar to (10). Another way of constructing 
a scale-invariant vertex weight is by integrating $\hat{h}_\omega$ over a range 
of frequencies, e.g., 

$$f(v) = \|\hat{h}_\omega(v, v)\|_\omega = \left(\int_{\omega_1}^{\omega_2} \hat{h}_\omega^2(v, v)\, d\omega\right)^{1/2}. \tag{15}$$

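The chain of transformations described above (logarithmic sampling in time, logarithm, derivative, Fourier magnitude) can be sketched in discrete form as follows. This is an illustration under stated assumptions: the function name `scale_invariant_kernel` is hypothetical, the derivative is approximated by a first-order finite difference, and the input is a synthetic sum of decaying exponentials rather than a heat kernel computed on a mesh.

```python
import numpy as np

def scale_invariant_kernel(h_samples, n_freq=6):
    """Map kernel values sampled at t = alpha**tau (uniform tau) to |FFT| features."""
    log_h = np.log(h_samples)
    d = np.diff(log_h, axis=-1)   # finite difference in tau; cancels a constant factor of h
    return np.abs(np.fft.fft(d, axis=-1))[..., :n_freq]

# Synthetic auto-diffusivity values (an assumption): h_t = sum_i c_i exp(-lambda_i t).
tau = np.arange(1.0, 25.0 + 1e-9, 1.0 / 16.0)   # base-2 logarithmic time grid
t = 2.0 ** tau
lam = np.array([0.0, 0.01, 0.05])
c = np.array([1.0, 2.0, 3.0])
h = (c * np.exp(-np.outer(t, lam))).sum(axis=1)

features = scale_invariant_kernel(h)
features_rescaled = scale_invariant_kernel(7.5 * h)   # same kernel times a constant
```

Multiplying the kernel by a constant leaves the features unchanged, since the constant becomes an additive term after the logarithm and is removed by the difference. A rescaling of the time variable turns into a shift along `tau`, which the Fourier magnitude suppresses only approximately on a finite sampling window.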


5. Descriptors 



5.1. Point descriptors 

Once the regions are detected, their content can be described 
using any standard point-wise descriptor of the form $\alpha : V \to \mathbb{R}^q$. 
In particular, here we consider point-wise heat kernel descriptors
proposed in [30]. The heat kernel descriptor (or heat 
kernel signature, HKS) is computed by taking the values of the 
discrete auto-diffusivity function at vertex $v$ at multiple times, 
$\alpha(v) = (h_{t_1}(v, v), \dots, h_{t_q}(v, v))$, where $t_1, \dots, t_q$ are some fixed 
time values. Such a descriptor is a vector of dimensionality $q$ 
at each point. Since the heat kernel is an intrinsic quantity, the 
HKS is invariant to isometric transformations of the shape. 

A scale-invariant version of the HKS descriptor (SI-HKS) 
can be obtained as proposed in [3] by replacing $h_t$ with $\hat{h}_\omega$ 
from (14), yielding $\alpha(v) = (\hat{h}_{\omega_1}(v, v), \dots, \hat{h}_{\omega_q}(v, v))$, where 
$\omega_1, \dots, \omega_q$ are some fixed frequency values. In the following
experiments, the heat kernel was sampled at time values 
$t = 2^1, 2^{1+1/16}, \dots, 2^{25}$. The first six discrete frequencies of the 
Fourier transform were taken, repeating the settings of [3]. 

3 Since the component inclusion relations giving rise to the component tree are 
invariant to any monotonic transformation of the weighting functions, it is 
sufficient to undo just the scaling of the time parameter $t$ without undoing
the scaling of the kernel itself. However, such a transformation affects 
the scores of the detected regions. We found that the logarithmic transformation
and derivative improve repeatability. Furthermore, by completely undoing 
the effect of scaling, the modified heat kernel can be used both in the weighting 
function and in descriptors of the maximally stable components as detailed in 
the following section. 

Figure 1: Maximally stable regions detected on different shapes from the TOSCA dataset. Note the invariance of the regions to strong non-rigid deformations. 
Also observe the similarity of the regions detected on the female shape and the upper half of the centaur (compare to the male shape from Figure 2). Regions were 
detected using $h_t(v, v)$ as the vertex weight function, with $t = 2048$. 



5.2. Region descriptors 

Given a descriptor $\alpha(v)$ at each vertex $v \in V$, the simplest 
way to define a region descriptor of a component $C \subset V$ is by 
computing the average of $\alpha$ in $C$, 



$$\beta(C) = \sum_{v \in C} \alpha(v)\, da(v). \tag{16}$$



The resulting region descriptor $\beta(C)$ is a vector of the same dimensionality
$q$ as the point descriptor $\alpha$. 

An alternative construction considered here follows Ovsjanikov
et al. [24], where global shape descriptors were obtained
from point-wise descriptors using the bag of features 
paradigm [28]. In this approach, a fixed "geometric vocabulary"
$\alpha_1, \dots, \alpha_p$ is computed by means of an off-line clustering 
of the descriptor space. Next, each point descriptor at $v$ is represented
in the vocabulary using vector quantization, yielding a 
point-wise $p$-dimensional distribution of the form 

$$\theta_l(v) \propto e^{-\|\alpha(v) - \alpha_l\|^2 / 2\sigma^2}. \tag{17}$$

The distribution is normalized in such a way that the elements 
of $\theta(v)$ sum to one. In the case of $\sigma = 0$, hard vector quantization
is used, and $\theta_l(v) = 1$ for $\alpha_l$ being the closest element of the 
geometric vocabulary to $\alpha(v)$ in the descriptor space, and zero 
elsewhere. Given a component $C$, we can define a local bag of 
features by computing the distribution of geometric words over 
the region, 

$$\beta(C) = \sum_{v \in C} \theta(v)\, da(v). \tag{18}$$

Such a bag of features is used as a region descriptor of dimensionality
$p$. 

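Both region descriptors can be sketched in a few lines. The function names and the toy data are hypothetical. Whether the averaged descriptor is divided by the region area is not settled here, so it is exposed as an optional `normalize` flag, which is an assumption of this sketch.

```python
import numpy as np

def region_average(alpha, da, region, normalize=False):
    """Area-weighted sum (or mean, if normalize=True) of point descriptors over a region."""
    r = np.asarray(region)
    total = (alpha[r] * da[r, None]).sum(axis=0)
    return total / da[r].sum() if normalize else total

def soft_assignment(alpha, vocab, sigma):
    """theta[v, l] proportional to exp(-||alpha(v) - alpha_l||^2 / (2 sigma^2)); rows sum to one."""
    d2 = ((alpha[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=-1)
    if sigma == 0:                                    # hard vector quantization
        theta = np.zeros_like(d2)
        theta[np.arange(len(d2)), d2.argmin(axis=1)] = 1.0
        return theta
    # Subtracting the row minimum avoids underflow and does not change the normalized result.
    theta = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return theta / theta.sum(axis=1, keepdims=True)

def local_bag_of_features(theta, da, region):
    """Area-weighted distribution of geometric words over a region."""
    r = np.asarray(region)
    return (theta[r] * da[r, None]).sum(axis=0)

# Toy data (assumptions): 5 vertices, 2-dimensional descriptors, 3 vocabulary words.
alpha = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1], [2.0, 0.0]])
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
da = np.array([1.0, 2.0, 1.0, 1.0, 3.0])
region = [0, 1, 2]

theta_soft = soft_assignment(alpha, vocab, sigma=0.5)
theta_hard = soft_assignment(alpha, vocab, sigma=0)
bag = local_bag_of_features(theta_hard, da, region)
avg = region_average(alpha, da, region)
```

Because each row of `theta` sums to one, the entries of the local bag of features add up to the area of the region, so dividing by that area would turn it into a distribution over the vocabulary.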


6. Results 

6.1. Dataset 

The proposed approach was tested on the data of the 
SHREC'10 feature detection and description benchmark [2]. 
The SHREC dataset consisted of three shape classes, with simulated
transformations applied to them. Shapes are represented 
as triangular meshes with approximately 10,000 to 50,000 vertices.
In our experiments, all meshes were downsampled to at 
most 10,000 vertices. Each shape class contained nine categories
of transformations: isometry (non-rigid almost inelastic 
deformations), topology (welding of shape vertices resulting in 
different triangulation), micro holes and big holes simulating 
missing data and occlusions, global and local scaling, additive 
Gaussian noise, shot noise, and downsampling (less than 20% 
of the original points). Each transformation appeared in five different
strengths. Vertex-wise correspondence between the transformed
and the null shapes was given and used as the ground 
truth in the evaluation of region detection repeatability. Since 
all shapes exhibit intrinsic bilateral symmetry, the best results over 
the ground-truth correspondence and its symmetric counterpart 
were used. 

We also used several deformable shapes from the TOSCA 
dataset [1] for a qualitative evaluation. 

6.2. Detector repeatability 

The evaluation of the proposed feature detector and descriptor
followed the spirit of the influential work by Mikolajczyk 
et al. [20]. In the first experiment, the repeatability of the detector
was evaluated. Let $X$ and $Y$ be the null and the transformed
version of the same shape, respectively. Let $X_1, \dots, X_m$ 
and $Y_1, \dots, Y_n$ denote the regions detected in $X$ and $Y$, and let 
$X'_j$ be the image of the region $Y_j$ in $X$ under the ground-truth 
correspondence (see footnote 4). Given two regions $X_i$ and $Y_j$, their overlap is 
defined as the following area ratio 

$$O(X_i, X'_j) = \frac{A(X_i \cap X'_j)}{A(X_i \cup X'_j)} = \frac{A(X_i \cap X'_j)}{A(X_i) + A(X'_j) - A(X_i \cap X'_j)}. \tag{19}$$

4 As some of the transformed shapes had missing data compared to the null 
shape, comparison was defined single-sidedly. Only regions in the transformed 
shape that had no corresponding regions in the null counterpart decreased the 
overlap score, while unmatched regions of the null shape did not. 






Figure 2: Maximally stable regions detected on shapes from the SHREC'10 dataset using the vertex weight $h_t(v, v)$ with $t = 2048$. First row: different approximate 
isometries of the human shape. Second row: different transformations (left to right): holes, local scale, noise, shot noise and scale. 



The repeatability at overlap $o$ is defined as the percentage of regions
in $Y$ that have corresponding counterparts in $X$ with overlap
greater than $o$ [20]. An ideal detector has a repeatability 
of 1 (100%). 

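The overlap and repeatability measures can be sketched as follows. Regions are represented as sets of vertex indices of the null shape, so the regions detected on the transformed shape are assumed to have been mapped through the ground-truth correspondence already. The function names and the toy regions are hypothetical.

```python
def region_area(region, da):
    """Sum of the discrete area elements over a set of vertex indices."""
    return sum(da[v] for v in region)

def overlap(region_a, region_b, da):
    """Area of the intersection divided by the area of the union."""
    a, b = set(region_a), set(region_b)
    inter = region_area(a & b, da)
    union = region_area(a, da) + region_area(b, da) - inter
    return inter / union if union > 0 else 0.0

def repeatability(regions_x, mapped_regions_y, da, o):
    """Fraction of mapped regions of Y that overlap some region of X by more than o."""
    hits = sum(1 for y in mapped_regions_y
               if any(overlap(x, y, da) > o for x in regions_x))
    return hits / len(mapped_regions_y)

# Toy data (assumptions): six vertices with unit area elements.
da = [1.0] * 6
regions_x = [{0, 1, 2}, {3, 4}]
mapped_regions_y = [{0, 1, 2}, {4, 5}]
rep = repeatability(regions_x, mapped_regions_y, da, o=0.75)
```

In the toy example the first mapped region coincides with a region of the null shape, while the second overlaps its best counterpart by only one third, so the repeatability at 75% overlap is one half.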
Four vertex weight functions were compared: the discrete heat 
kernel (10) with $t = 2048$, the commute-time kernel (3), the modified 
heat kernel with $\omega = 0$, and the norm of the modified heat kernel
(15). These four scalar fields were also used to construct 
edge weights according to $d(v_1, v_2) = |f(v_1) - f(v_2)|$. Furthermore,
since these kernels are functions of a pair of vertices, 
they were used to define edge weights according to (12). In 
addition, we also tested edge weights constructed according to 
(11) and the diffusion distance (13). Unless mentioned otherwise,
$t = 2048$ was used for the heat kernel and $\omega = 0$ for the 
modified heat kernel, as these settings turned out to give the best 
performance on the SHREC'10 dataset. 

We first evaluated different region detectors qualitatively using
shapes from the SHREC'10 and the TOSCA datasets. Figure 1
shows the regions detected using the vertex weight $h_t(v, v)$ 
with $t = 2048$ on a few sample shapes from the TOSCA 
dataset. Figures 2 and 5 depict the maximally stable components
detected with the same settings on several shapes from the 
SHREC dataset. Figure 3 shows the regions obtained using the 
edge weighting function $1/h_t(v_1, v_2)$. In all cases, the detected 
regions appear robust and repeatable under the transformations. 
Surprisingly, many of these regions have a clear semantic interpretation.
Moreover, similar-looking regions are detected on 
the male and female shapes, and the upper half of the centaur. 
This makes the proposed feature detector a good candidate for 
partial shape matching and retrieval. 

Figure 5: The set of maximally stable regions extracted from one of the shapes 
in Figure 2. 

In order to select an optimal cutoff threshold of the instability 
function (i.e., the maximum region instability value that is still 
accepted by the detector), we estimated the empirical distributions
of the detected regions as a function of the instability 
score and their overlap with the corresponding ground-truth regions.
These histograms are depicted in Figure 4. A good detector
should produce many regions with overlap close to 100% 
that have low instability, and as few low-overlap regions as 
possible, with the latter having high instability so that they can be separated
from the high-overlap regions by means of a threshold. In 
each of the tested detectors, the instability score threshold was 
selected to maximize the detection of high-overlap regions. 

Figure 3: Maximally stable regions detected on shapes from the SHREC'10 dataset using the edge weight $1/h_t(v_1, v_2)$ with $t = 2048$. Region coloring is arbitrary. 

Table 1 summarizes the repeatability of different weighting 
functions at an overlap of 75%. Figures 6 and 7 depict the repeatability
and the number of correctly matching regions as a function
of the overlap for the best four of the compared weighting 
functions. We conclude that scale-dependent weightings generally
outperform their scale-invariant counterparts in terms of 
repeatability. The four scalar fields corresponding to different 
auto-diffusivity functions perform well both when used as vertex
and edge weights. The best repeatability is achieved by the edge 
weighting function $1/h_t(v_1, v_2)$. The best scale-invariant weighting 
is also an edge weight, $1/c(v_1, v_2)$. 

6.3. Descriptor discriminativity 



In the second experiment, the discriminativity of region de- 
scriptors was evaluated by measuring the relation between dis- 
tance in the descriptor space and the overlap between the corre- 
sponding regions. 

Using the notation from the previous section, let F, be one of 
the n maximally stable components detected on a transformed 
shape Y, X[ its image on the null shape X under the ground 
truth correspondence, and let Xj denote one of the m maximally 
stable components detected on the null shape. A groundtruth 
relation between the regions is established by fixing a minimum 
overlap p = 0.75 and deeming F, and Xj matching if oy = 
0(X',Xj) > p. Let us now be given a region descriptor ft; for 
simplicity we assume the distance between the descriptors to 
be the standard Euclidean distance. By setting a threshold r on 
this distance, F, and Xj will be classified as positives if dij = 
||/3(Y/) -fi{Xj)\\ < t. We define the true positive rate as the ratio 



TPR = 



\{djj<T)\ _ 
lion >p}\' 



similarly, the false positive rate is defined as 
\{dij > t}\ 



FPR 



{Oij < p}\ ' 



(20) 



(21) 



A related quantity is the false negative rate, defined as
$\mathrm{FNR} = 1 - \mathrm{TPR}$. By varying the threshold $\tau$,
a set of pairs (FPR, TPR) referred to as the receiver operating
characteristic (ROC) curve is obtained. The particular point on the
ROC curve for which the false positive and false negative rates
coincide is called the equal error rate (EER). We use the EER as a
scalar measure of the descriptor discriminativity. Ideal
descriptors have EER = 0.
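
As an illustration of how the ROC curve and the EER can be obtained
from pairwise descriptor distances and overlaps, the following
minimal Python sketch may be helpful. It assumes that the distances
$d_{ij}$ and overlaps $o_{ij}$ are already available as nested lists
`d` and `o`. The function names `roc_points` and `equal_error_rate`
are hypothetical, and the EER is approximated by the sampled ROC
point at which FPR and FNR are closest, not by interpolation.

```python
def roc_points(d, o, rho=0.75):
    """Return (tau, FPR, TPR) triples, one per distinct distance value.

    d[i][j]: descriptor distance between regions Y_i and X_j (assumed given)
    o[i][j]: overlap between the image of Y_i on the null shape and X_j
    """
    # Flatten all region pairs into (distance, is_ground_truth_match).
    pairs = [(dij, oij >= rho)
             for drow, orow in zip(d, o)
             for dij, oij in zip(drow, orow)]
    n_pos = sum(1 for _, match in pairs if match)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one matching and one non-matching pair")

    points = []
    for tau in sorted({dij for dij, _ in pairs}):
        # A pair is classified as positive when its distance is <= tau.
        tp = sum(1 for dij, match in pairs if dij <= tau and match)
        fp = sum(1 for dij, match in pairs if dij <= tau and not match)
        points.append((tau, fp / n_neg, tp / n_pos))
    return points


def equal_error_rate(points):
    """Approximate EER: sampled ROC point where FPR and FNR = 1 - TPR are closest."""
    _, fpr, tpr = min(points, key=lambda p: abs(p[1] - (1.0 - p[2])))
    return 0.5 * (fpr + (1.0 - tpr))
```

For perfectly separable toy data such as
`d = [[0.1, 0.9], [0.8, 0.2]]` and `o = [[0.9, 0.1], [0.2, 0.8]]`,
this sketch yields an EER of 0, in line with the ideal case.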

Another descriptor performance criterion used here considers
the first matches produced by the descriptor distance. For that
purpose, for each $X_i$ we define its first match as the
$Y_{j^*(i)}$ with $j^*(i) = \arg\min_j d_{ij}$ (nearest neighbor of
$\beta(X_i)$ in the descriptor space). The matching score is
defined as the ratio of correct first matches for a given overlap
$\rho$,

$$\mathrm{score}(\rho) = \frac{|\{i : o_{ij^*(i)} \ge \rho\}|}{m}. \tag{22}$$
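
The matching score can be sketched in a few lines of Python. This is
only an illustration: it assumes that the distance and overlap
matrices are given as nested lists indexed consistently (row = query
region, column = candidate region), and that the score is normalized
by the number of query regions. The name `matching_score` is
hypothetical.

```python
def matching_score(d, o, rho=0.75):
    """Fraction of query regions whose first match is a correct match.

    d[i][j]: descriptor distance between query region i and candidate j
    o[i][j]: ground-truth overlap for the same pair (assumed given)
    """
    if not d:
        raise ValueError("no query regions")
    correct = 0
    for drow, orow in zip(d, o):
        # First match: nearest neighbor in the descriptor space.
        j_star = min(range(len(drow)), key=drow.__getitem__)
        if orow[j_star] >= rho:
            correct += 1
    return correct / len(d)
```

Sweeping `rho` over a range of overlaps gives the matching score as a
function of the overlap.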



The following four weighting functions exhibiting the best
repeatability scores in the previous experiment were used to define
region detectors: the edge weight $1/h_t(v_1, v_2)$ with $t = 1024$
(absolute winner in terms of repeatability), the vertex weight
$h_t(v, v)$ (second-best repeatability), its edge-weight counterpart
$|h_t(v_1, v_1) - h_t(v_2, v_2)|$ (gives lower repeatability scores
but supplies almost twice as many correspondences), and the edge
weight $1/c(v_1, v_2)$ (best scale-invariant detector). Given the
maximally stable components detected by a selected detector, region
descriptors were calculated. We used two types of point descriptors:
the heat kernel signature $h_t(v, v)$ sampled at six time values
$t = 16, 22.6, 32, 45.2, 64, 90.5, 128$, and its scale-invariant
version $h_\omega(v, v)$, for which we have taken the first six
discrete frequencies of the Fourier transform (these settings are
identical to [31]). These point descriptors were used to create
region descriptors using averaging () and local bags of features ().
Bags of features were tested with vocabulary sizes $p = 10$ and
$12$. Table 2 summarizes the performance in terms of EER of
different combinations of weighting functions and region
descriptors. Figure 8 depicts the ROC curves of different
descriptors of vertex- and edge-weighted maximally stable component
detectors.
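
To make the two region descriptor flavors more concrete, the
following Python sketch shows one possible way of turning point
descriptors into a region descriptor. It is a simplification under
stated assumptions: the average is an unweighted mean over the
vertices of the region (no area weighting), the bag of features uses
hard assignment to the nearest word of a precomputed vocabulary, and
the names `average_descriptor` and `bag_of_features` are
hypothetical.

```python
import math


def average_descriptor(point_descs):
    """Region descriptor as the unweighted mean of its point descriptors."""
    if not point_descs:
        raise ValueError("empty region")
    n = len(point_descs)
    dim = len(point_descs[0])
    return [sum(p[k] for p in point_descs) / n for k in range(dim)]


def bag_of_features(point_descs, vocabulary):
    """Normalized histogram of nearest vocabulary words over a region.

    vocabulary: list of p representative point descriptors (assumed to be
    precomputed, e.g. by clustering; the clustering itself is not shown).
    """
    if not point_descs:
        raise ValueError("empty region")
    hist = [0.0] * len(vocabulary)
    for p in point_descs:
        # Hard assignment: each vertex votes for its closest word.
        nearest = min(range(len(vocabulary)),
                      key=lambda w: math.dist(p, vocabulary[w]))
        hist[nearest] += 1.0
    return [h / len(point_descs) for h in hist]
```

Either region descriptor can then be compared with the Euclidean
distance, e.g. `math.dist(a, b)`, as assumed in the evaluation above.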

Figures 9-12 show the number of correct first matches and
the matching score as a function of the overlap for different
choices of weighting functions and descriptors. Examples of
matching regions are depicted in Figure 13.

Figure 4: Distributions of maximally stable components as a function of the overlap to the groundtruth regions and instability score. Shown left-to-right, top-to-bottom
are the following weighting functions: vertex weight $h_t(v, v)$ at $t = 2048$, vertex weight $c(v, v)$, edge weight $\|h_t(v_1, \cdot) - h_t(v_2, \cdot)\|_X$ (diffusion distance) at $t = 2048$,
edge weight $1/h_t(v_1, v_2)$ at $t = 2048$, vertex weight $h_\omega(v, v)$ at $\omega = \ldots$, and edge weight $|h_t(v_1, v_1) - h_t(v_2, v_2)|$ at $t = 2048$. Good detectors are characterized by a large
number of high-overlap stable regions (many regions in the upper left corner of the plot) that can be easily separated from the low-overlap regions that should be
concentrated in the lower right corner.

We conclude that the scale-invariant HKS (SI-HKS) descriptor
consistently exhibits higher performance in both the average and
bag of features flavors, which perform approximately the same.
The HKS descriptor, on the other hand, performs better in the bag
of features setting, though never reaching the scores of SI-HKS.
Surprisingly, as can be seen from Figures 9-12, the SI-HKS
descriptor is consistently more discriminative even in
transformations not including scaling.



7. Conclusions 



We presented a generic framework for the detection of stable
regions in non-rigid shapes. Our approach is based on the
maximization of a stability criterion in a component tree
representation of the shape with vertex or edge weights. Using
diffusion-geometric weighting functions yields a feature detection
algorithm that is invariant to a wide class of shape
transformations, in particular non-rigid bending and global
scaling, which makes our approach applicable in the challenging
setting of deformable shape analysis. In follow-up studies, we
intend to explore the uses of the proposed feature detectors and
descriptors in shape matching and retrieval problems.
Table 1: Repeatability of maximally stable components with different vertex and edge weighting functions.



Weighting                            HKS      HKS          HKS          SI-HKS   SI-HKS       SI-HKS
function                             Average  BoF(p = 10)  BoF(p = 12)  Average  BoF(p = 10)  BoF(p = 12)
$h_t(v, v)$                          ... 0.091 ...
$1/h_t(v_1, v_2)$                    0.304 ... 0.281 ... 0.093 0.090
$|h_t(v_1, v_1) - h_t(v_2, v_2)|$    0.213    0.212        0.222        0.085    0.091        0.094
$1/c(v_1, v_2)$                      0.260    0.284        0.294        0.147    0.157        0.148

Table 2: Equal error rate (EER) performance of different maximally stable component detectors and descriptors ($t = 2048$ was used in all cases); $p$ denotes the
vocabulary size in the bag of features region descriptors.
Figure 6: Repeatability of maximally stable components with the vertex weight $h_t(v, v)$ (first row) and edge weight $1/h_t(v_1, v_2)$ (second row), $t = 2048$.

Figure 7: Repeatability of maximally stable components with the edge weight $|h_t(v_1, v_1) - h_t(v_2, v_2)|$ (first row) and edge weight $1/c(v_1, v_2)$ (second row), $t = 2048$.

Figure 8: ROC curves of different region descriptors ("vs" stands for vocabulary size). The following detectors were used (left-to-right, top-to-bottom): vertex
weight $h_t(v, v)$, edge weight $1/h_t(v_1, v_2)$, edge weight $|h_t(v_1, v_1) - h_t(v_2, v_2)|$, and edge weight $1/c(v_1, v_2)$.

Figure 9: Performance of region descriptors with regions detected using the vertex weight $h_t(v, v)$, $t = 2048$. Shown are the HKS descriptor (first row) and SI-HKS
descriptor (second row).

Figure 10: Performance of region descriptors with regions detected using the edge weight $1/h_t(v_1, v_2)$, $t = 2048$. Shown are the HKS descriptor (first row) and
SI-HKS descriptor (second row).

Figure 11: Performance of region descriptors with regions detected using the edge weight $|h_t(v_1, v_1) - h_t(v_2, v_2)|$, $t = 2048$. Shown are the HKS descriptor (first
row) and SI-HKS descriptor (second row).

Figure 12: Performance of region descriptors with regions detected using the edge weight $1/c(v_1, v_2)$. Shown are the HKS descriptor (first row) and SI-HKS
descriptor (second row).

Figure 13: Examples of closest matches found for different query regions on the TOSCA dataset. Shown from left to right are: query, 1st, 2nd, 4th, 10th, and 14th
matches. Vertex weight $h_t(v, v)$ with $t = 2048$ was used as the detector; average SI-HKS was used as the descriptor.